Abstract

Education leaders and much of the literature exhort teachers and school leaders to use data more often and
more effectively to guide planning and decision making, an approach called “data-driven decision making.”
The term is ubiquitous in the literature and in reform discourse, but “on the ground,” so to speak, practitioners
face significant challenges in analyzing, understanding, and applying data to improve practice. Obstacles
faced by practitioners include insufficient expertise, tools, and time. In addition, organizational cultures in
schools generally create few incentives for data analysis and, as often as not, sustain norms inimical to
the collaboration and collective action that data-driven decision making requires. The case reported here
illustrates these challenges through the actions of a principal who identifies a problem of organizational
culture and instructional practice and leads an initiative to promote collaboration, analysis, and reflection
to help improve writing instruction in his school.

1 Summary in Spanish (translated to English)

Education leaders and much of the literature urge teachers and school leaders to use data more often and
more effectively to guide planning and decision making, an approach called “data-driven decision making.”
The term is ubiquitous in the literature and in reform discourse, but “on the ground,” so to speak,
practitioners face significant challenges in analyzing, understanding, and applying data to improve practice.
The obstacles practitioners face include insufficient expertise, tools, and time; in addition, organizational
cultures in schools generally create few incentives for data analysis and, as often as not, sustain norms
hostile to the collaboration and collective action that data-driven decision making requires. The case
reported here illustrates these challenges through the actions of a principal who identifies a problem of
organizational culture and instructional practice and leads an initiative to promote collaboration, analysis,
and reflection to help improve writing instruction in his school.

note: The Spanish text was a computer translation of the original web page. It was supplied as
general information and should not be considered complete or exact.

2 Introduction 

Education leaders and much of the literature exhort teachers and school leaders to use data more to guide planning
and decision making, an approach called “data-driven decision making.” A substantial literature has emerged with
theoretical models and practical prescriptions. Yet typical practice as shown by empirical studies still falls 
well short of theory-based conceptions embraced by scholars and reformers. Practitioners still face many 
challenges in analyzing, understanding, and applying data to improve practice. 

This case illustrates an application of data analysis in service of standards-based instruction. A school 
principal is concerned about variable academic standards in his school, particularly in literacy instruction. 
Teachers have avoided for the most part collaborative planning and there has been little scrutiny or discussion 
of practice. Seeking a mechanism to change this culture, he organizes a benchmarking activity to examine 
and rate student writing in 5th grade - an activity he hopes will stimulate teacher conversations about
writing, help create greater consistency among teachers in assessing writing, and show that instruction can 
and should be subject to systematic empirical inquiry. 

This module strengthens knowledge and skills in using and analyzing assessment data and applying data 
analysis toward the aim of standards-based writing instruction. 

3 Notes For Use As An Instructional Module 

Section I. presents theory and research on data-based decision making and school leadership; the first part 
provides background on data-based decision making and the second part discusses challenges of school 
leadership aimed at supporting data inquiry to improve practice. Section I. can be read and discussed with 
or without supplementary reading (see reference list). Students should discuss ways to connect data with 
writing instruction, trying to be specific about how to achieve what they propose (e.g., how would you 
actually do this in your school?). 

The first part of Section II. presents the methods and the case - an account of one school’s approach 
to data analysis through a benchmarking activity in writing. The principal is concerned about a multi-year 
pattern of mediocre writing assessment results at his school. He wants to stimulate inquiry into practice and 
motivate change. This case describes a data-based benchmarking process. 

This module can be read in its entirety followed by discussion, or the discussion leader can treat each 
section separately. 

The second part of Section II. presents the analyses and results. The discussion leader should review each 
table and figure thoroughly so students understand each one. Some are just descriptive snapshots requiring
little interpretation; others contain more information, require more interpretation, and raise discussion
questions about their implications or limitations.

Section III. presents discussion questions and exercises to deepen students’ understanding of the data
(quantitative literacy) and to reflect on implications for leadership, professional development, and
instructional change.

Section IV. provides notes for the instructor on the questions and exercises of Section III. as well as a 
rubric to help evaluate the module’s largest assignment: proposing an analysis to supplement or expand on 
the analysis depicted in the case. 

4 Section I. 

5 Background: Theory and Research 

From US Secretary of Education Arne Duncan addressing the Fourth Annual IES Research Conference (IES, 
2009), “Robust Data Gives Us The Roadmap to Reform:” 

“I am a deep believer in the power of data to drive our decisions. Data gives us the roadmap to reform. It
tells us where we are, where we need to go, and who is most at risk. ... We will ask millions of teachers to
use student achievement data and annual growth data to drive instruction and evaluation.”

Secretary Duncan’s hope for data-driven schools is shared by many (Bernhardt, 2004; Boudett, City, & 
Murnane, 2005; DQC, 2009; Kowalski, Lasley, & Mahoney, 2008; Mills, 2007). A large literature has emerged
on data-based decision making, along with annual conferences (e.g., DQC, 2009; MIS, 2010) and a variety of
foundation-sponsored initiatives around the country helping to strengthen districts’ data systems and personnel
training. Media accounts with headlines like “Data-driven schools see rising scores” (Hechinger, 2009) help
fuel high hopes for data-driven decision making, as do portrayals of model schools or districts (Henke, 2005;
Datnow, Park, & Wohlstetter, 2007; Zavadsky & Dolejs, 2007). There is no doubt that the role of data in
teachers’ and principals’ practice has grown over the last decade along with improvements in data quality 
and data access technologies. 

As Kerr, Marsh, Ikemoto, Darilek, and Barney (2006, p. 498) note, there are high expectations and much 
potential for data use, and many ways data can be brought to bear on planning and practice in schools. 

“Most commonly, data are used for tasks such as setting annual and intermediate goals as part of the school
improvement process. Data may also be used to visually depict goals and visions, motivate students and staff,
and celebrate achievement and improvement. Schools use data for instructional decisions such as identifying
objectives, grouping and individualizing instruction, aligning instruction with standards, refining course
offerings, identifying low-performing students, and monitoring student progress. School structure, policy,
and resource use may be informed by data. Schools have also used data for decisions related to personnel,
such as evaluating team performance and determining and refining topics for professional development (see,
e.g., Bernhardt 2003; Choppin 2002; Feldman and Tung 2001; Mason 2002; Supovitz and Klein 2003).”

A persistent gap exists between theory and typical practice, as empirical studies show (Bruner et
al., 2005; Coburn & Talbert, 2006; Coburn, Toure, & Yamashita, 2009; Ingram, Louis, & Schroeder,
2004; Means, Padilla, & Gallagher, 2010; Wayman, 2005). Most schools still face significant challenges in
trying to use data effectively. Teachers’ roles have never, in the history of the profession, entailed
widespread expectations of proficiency and participation in data analysis to examine practice, evaluate
outcomes, and guide decision making and planning, especially in the collaborative groups commonly
espoused today. Transforming those roles is not easy.

The challenge, for the most part, is not a lack of data. Indeed,
modern data collection and information technology have filled districts’ databases with test scores, grading
records, conduct records, health information, demographic data, student transcripts, personnel records,
finance and budgeting data, survey data, parent information, and more. What districts generally don’t have
are enough staff in each school with the needed expertise, initiative,
and tools to turn data into actionable information.

Part of the challenge is capacity and part is school culture. Capacity issues include limited
expertise, data access, analytical tools, and available time. Beyond these issues of capacity are issues of
school culture: many teachers are ambivalent about examining their practice with objective data. While no single
“attitude” characterizes all teachers’ stance toward data, many teachers are apprehensive about the prospect 
of spotlighting classroom results and many do not fully understand uses of data for guiding instructional 
planning and evaluation. In any given school, if enough staff have skeptical or resistant attitudes, it will be 
challenging for the school’s leaders to build what is widely advocated in the literature: a culture of inquiry, 
reflection, and collaboration in which data plays a key role in planning and decision making (Zavadsky & 
Dolejs, 2007; Newman, 2006). 

School leadership is the key to building a staff culture willing to examine practice and collaborate for 
improvement. Leadership, ideally, should come from a team - the principal and teacher leaders - and it 
should be driven by clear and specific purposes. Research shows that leadership teams united by a common 
purpose can be powerful agents of school improvement (McLaughlin & Talbert, 2006; Vescio, Ross, &
Adams, 2008) and that, among the many improvement-focused purposes such teams can serve, focusing on
data is an important priority (Chrispeels, Castillo, & Brown, 2000; Young, 2006). For instance, Young 
(2006), based on case studies of four schools, found that leadership effectiveness was a major variable in 
teacher buy-in and participation. Here is a description of a school with effective leadership: 

“The Hilltop principal’s vision centers on teachers’ learning about instruction as revealed in accounts of
classroom practices and in classroom artifacts, supported by a community that holds its members accountable
for learning. The agendas and structured activities that the principal and her leadership council establish
for the second-grade team’s collaboration time define the data in this setting. Data for them consist both of
what teachers reveal of their classrooms, as in war stories and student work samples, and how they measure
progress, as in assessment results. These times also give Hilltop teachers collaborative experiences around
data analysis that begin to build the principal’s desired norms. For example, over several meetings in which
second-grade teachers jointly scored student writing, one reluctant team member moved from withholding
student work, to sharing writing she had already scored on her own, to finally accepting joint grade-level
decisions on certain samples to calibrate her scores with the team’s interpretation of the district writing
rubric. The second-grade team is thus deepening their collaboration, their professional trust in sharing
student work, lesson plans, and formative assessments results, and their sense of joint enterprise” (p. 538).

It is neither simple nor easy for teachers in a school to transition from roles of autonomy to teamwork and 
to face heightened expectations about using data to guide decisions. New roles require new skills and may 
bring changed routines. For leaders it requires identifying opportunities that will create staff buy-in, but 
that also promote staff learning and improved practice (Chen, Heritage, & Lee, 2005; Chrispeels et al., 2000; 
Copland, 2003). At the same time, an initiative can go badly if leaders assign staff to tasks or roles they 
perceive as unproductive or that threaten pride or professional efficacy. As described above, there are many 
ways of bringing data analysis into practice and many challenges the leader must recognize and anticipate. 

One strategy with potential is assessment benchmarking. As used here, this refers to teachers reflecting 
on and calibrating their assessment criteria and standards against a pre-established standard. Benchmarking
can be as simple as a group of teachers discussing a particular scoring rubric and relating it to their
individual assessment criteria and standards. Benchmarking can also be more elaborate, involving systematic
procedures to rate student work, record assessment scores, analyze results, and develop action plans.
The case presented here shows a method to examine writing assessment scores to identify variation and 
consistency among teachers and to guide teachers’ discussion of data, writing instruction, and assessment. 
It is an approach that is feasible with typical school data and tools and that does not depend on analyses 
beyond what is reasonable to expect from professional educators. Managed well, it is a productive learning 
experience with the potential to improve practice. 

6 Section II. Examining Writing Assessment Standards and Consistency at Gilbert Elementary School

6.1 The Principal’s Concern With Excessive Variability in Writing Standards and Instruction 

“How was the conference, Dean?” asked Mary Smith, a 4th grade teacher at Gilbert Elementary School.
“Great,” replied Principal Dean Jansen, “I attended some interesting sessions... and got some good ideas for 
strengthening our writing instruction.” 

Jansen was principal of Gilbert Elementary, a grades 3-5 school with 470 students from a cross section
of backgrounds. Gilbert’s achievement scores in writing had remained stubbornly flat for a long time,
too long in Principal Jansen’s view. Almost half of Gilbert’s students scored “below standard” on the state
writing assessment; he was concerned that teachers were becoming resigned to this level of performance.
“We’ve got to turn this around” was one of his last statements at a faculty meeting before leaving for the 
conference. 

Principal Jansen attended the conference to seek strategies to promote greater instructional and grading
consistency in writing instruction among the teachers in his school. Over the last two years, based in part
on classroom observations and in part on conversations with teachers, Jansen had become concerned about 
excessive variation among teachers in writing instruction and standards. He didn’t have objective evidence 
of this variation or that it might be something the school needs to address, but he observed considerable 
variation in writing assignments and in teachers’ grading standards. 

Jansen’s concern about writing instruction grew in part from a conversation with Ms. Smith. What he 
learned gave him a fuller understanding of the degree of contrasts in instruction among different teachers. 
Ms. Smith described her collaboration with another teacher (Jane Jones) developing lessons connecting 
writing, reading, and science. Ms. Smith described how she was teaching water cycles in ecology and 
persuasive writing in language arts; and how she combined these subjects in a project where students 
composed editorials to the newspaper about street water runoff hurting local marshes. Students did this in 
groups - researching their topics and sharing and revising drafts of their editorial. Ms. Smith described how 
she and Ms. Jones teamed with two 5th grade teachers, so that 5th grade students would review the
editorials of the 4th graders and provide feedback before the 4th graders’ editorials were sent to the newspaper.
The local newspaper published several of the editorials. 

Principal Jansen knew most teachers did not do this. In fact, writing instruction in other classrooms 
typically lacked such inventiveness and cross-subject connections. In other classrooms, writing assignments 
focused more on spelling, vocabulary, and grammar worksheets, and less on actual writing. When writing
was assigned, it was more likely to be summarizing assigned readings or responding to assigned prompts (e.g.,
“write a page about what you would do if you could fly”). In some classrooms, not much writing was assigned
at all. There was much variation from classroom to classroom and not much collaboration among the teachers,
a situation not uncommon in schools (Rowan, Harrison, & Hayes, 2004; Smith, Lee, & Newmann, 2001;
Spillane, 2004).

Over the past year, Principal Jansen had been trying to promote more collaborative work among teachers 
and more discussions about instruction. He saw this as a major priority: strengthening the collaborative
culture of the school. At several recent faculty meetings, he drew attention to this, saying “we can be a better 
school if we work as a team.” He wanted to see more joint curriculum planning, sharing of instructional 
strategies, and uniform academic expectations. 

Principal Jansen was aware that some teachers were not entirely comfortable with the prospect of greater 
collaboration, concerned that it meant greater scrutiny of their teaching or sitting through a series of
unproductive meetings. Privately, many teachers believed, “what you do in your classroom is your business and
what I do in my classroom is my business.” Jansen was concerned about this mentality, viewing it as a
barrier to improvement (DuFour, 2011). In his view, the curriculum belonged to the school, not to each
individual teacher. He wanted to foster among teachers more of a shared commitment to all students and
a culture of collaboration. Jansen believed that if the school culture and curriculum were going to move in
the direction of common standards and methods of instruction, he would need to do more than periodically
advocate and encourage; he would need to focus teachers’ attention and discussions on evidence related to
practice.

Writing instruction would be the focus. He hoped to foster more regular discussions of instructional
strategies and grading expectations, sharing of assignments and assessments, and periodic benchmarking
activities in which teachers calibrate their performance expectations for students. Jansen had a number of ideas
for productive activities, and he knew others would also. He shared with the staff several articles related to
writing assessment and instruction (Andrade, Buff, Terry, Erano, & Paolino, 2009; Gere, 2010). But the
main activity he planned to focus on was examining and discussing data related to writing assessment and
grading standards. So he organized a benchmarking activity to help Gilbert’s 5th grade teachers calibrate
their writing assessment criteria and standards. He planned to engage other grades later, drawing on his
experience with this first benchmarking activity.

6.2 Methods of the Benchmarking Activity 

The following describes the methods of the benchmarking process used to help calibrate teachers’ grade-level
expectations for writing. It is a systematic way for teachers to compare their criteria and standards for
assessing student writing.

Step 1) All the 5th grade teachers used a common writing prompt drawn from the state writing assessment
rubric. (The writing prompt was available online.) The teachers in their individual classrooms each gave 
the writing assignment to their students, allowing about 2 hours with appropriate breaks. Each classroom 
produced about 24 essays. 

Step 2) Using the state writing assessment rubric, each teacher graded his/her students’ essays and 
recorded the scores in an Excel spreadsheet. Appendix A shows the kind of rubric used. 

Step 3) Several weeks later, two trained teachers with experience in rubric-based writing assessment
independently scored all the papers without knowing students’ names (each paper was given an anonymous
ID). These two teachers had attended state-hosted workshops on writing instruction and standards-based
assessment and had participated as raters for the state assessment. After scoring the papers independently,
they used a systematic process to give each paper a single score (the process is used for score calibration in
rubric-based holistic writing assessment). Thus, each paper received a “benchmark score.”

Step 4) After this process was complete, each essay had two scores (teacher’s score and benchmark
score). The data set had five columns: teacher ID, student ID, teacher’s score, benchmark score, and a
standardized test reading score (added to the data set for additional information, but not analyzed as part 
of this case). 

• Benchmark score: score from the trained assessors (1-5, with 5 the highest; based on the rubric, 3 is
considered “at standard” for 5th grade). As an approximate frame of reference, each score point can
be roughly viewed as a letter grade (5=A; 1=F). This frame of reference gives context for
interpreting the range and distribution of writing scores.

• Teacher score: score for a paper from each student’s own teacher (1-5).

• Reading test score: NCE score on the 5th grade state test in reading.
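
For readers who want to work with such a data set outside a spreadsheet, the five columns could be represented roughly as follows. This is only a sketch: the class name `EssayRecord`, the field names, and the sample row are illustrative assumptions and are not taken from the Gilbert data.

```python
from dataclasses import dataclass

AT_STANDARD = 3  # per the rubric, a score of 3 is "at standard" for 5th grade


@dataclass(frozen=True)
class EssayRecord:
    """One row of the data set (field names are hypothetical)."""
    teacher_id: int
    student_id: str
    teacher_score: int     # 1-5, assigned by the student's own teacher
    benchmark_score: int   # 1-5, assigned by the trained raters
    reading_nce: float     # NCE score on the state reading test

    def __post_init__(self):
        # Reject scores outside the 1-5 rubric scale.
        for name in ("teacher_score", "benchmark_score"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")

    @property
    def at_or_above_standard(self) -> bool:
        """True when the benchmark score meets the grade-level standard."""
        return self.benchmark_score >= AT_STANDARD


# Made-up example row, for illustration only.
row = EssayRecord(teacher_id=3, student_id="S017", teacher_score=4,
                  benchmark_score=3, reading_nce=52.0)
```

Keeping the range check next to the data definition means a mistyped score is caught when the row is created rather than later, during analysis.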

Key analyses and questions explored in the benchmarking activity include: 

• What is the distribution of student scores? How many students are below, at, and above standard 
in their writing proficiency based on the state-prescribed scoring rubric? This requires a frequency
analysis showing the raw counts and percentages of students scoring at each of the performance levels, 
which also shows how many are at or above standard and how many are not. 

• How well do the individual teachers’ scores match up with the benchmark scores? Are teachers’ ratings 
on average higher, lower, or about the same as compared with benchmark ratings? This requires 
computing classroom means of the teachers’ ratings and of the benchmark ratings and computing a 
classroom-level deviation score (numerical gap between teacher’s mean and benchmark rater’s mean 
for each classroom). 

• How much consistency or variation in standards is there among teachers across classrooms? This
requires, in addition to the deviation analysis above, comparing within each classroom the teacher’s 
and the benchmark rater’s scores to determine the consistency of each teacher’s scoring relative to 
the benchmark score for each student. This provides evidence of the extent to which each teacher is 
consistent in his/her application of assessment criteria from student to student. 


6.3 Analyses and Results 

Table 1 shows the number of students in each classroom. 

Table 2 and Figure 1 show that the large majority of students scored a 2 or a 3. The teachers’ scores and the
scores of the benchmark raters differ at the two ends of the scale. The benchmark teachers’ ratings produced
fewer 5s and more 1s: the benchmark teachers rated 3 papers a “5,” whereas the classroom teachers rated
14 papers a “5.” The classroom teachers rated 2 papers a “1,” whereas the benchmark teachers gave “1s” to
21 papers.
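
A frequency analysis of this kind can be sketched in a few lines of Python. The function names and the two short score lists below are hypothetical and serve only to show the computation; they are not the Gilbert scores.

```python
from collections import Counter


def score_distribution(scores, levels=range(1, 6)):
    """Return {level: (count, percent)} for each rubric level 1-5."""
    counts = Counter(scores)
    total = len(scores)
    return {
        level: (counts[level], round(100 * counts[level] / total, 1))
        for level in levels
    }


def percent_at_or_above(scores, standard=3):
    """Percent of papers scoring at or above the standard (3 by default)."""
    return round(100 * sum(s >= standard for s in scores) / len(scores), 1)


# Hypothetical scores for ten papers, for illustration only.
teacher_scores = [3, 2, 3, 4, 5, 2, 3, 3, 2, 4]
benchmark_scores = [3, 2, 2, 3, 4, 1, 3, 2, 2, 3]

for label, scores in (("teacher", teacher_scores), ("benchmark", benchmark_scores)):
    print(label, score_distribution(scores), percent_at_or_above(scores))
```

Running the same function on both sets of scores makes the differences at the ends of the scale easy to see side by side.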

Table 1: http://cnx.org/content/m41218/latest/tablet.png/image
Table 2: http://cnx.org/content/m41218/latest/table2.png/image

Do classroom mean scores on teacher-graded writing correlate with classroom means based on benchmark
scores? Table 3 reports two mean scores for each classroom: the classroom teacher’s ratings and the
benchmark ratings. The scores are sorted from highest classroom mean to lowest based on the teacher-graded
writing scores.

Figure 2 shows the classroom means in a scatterplot: the classroom’s benchmark score is on the X axis; 
the teacher score is on the Y axis. Each point is a classroom. 

Table 3 and Figure 2 (the scatterplot of Table 3’s scores) show that the teachers’ ratings are on average higher
than the benchmark ratings. Table 3 shows an overall mean of 2.9 among teachers versus 2.5 for benchmark.
The diagonal line on the scatterplot is a reference line; if teachers’ scores and the benchmark scores were
in perfect agreement, each point would fall on this line. The farther a point lies from the line, the greater the
deviation of the teacher’s score from the corresponding benchmark score for that classroom.
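
The classroom means and the classroom-level gap between them can be computed as in the sketch below. The function name `classroom_gaps`, the classroom labels, and the sample rows are assumptions made for illustration; the actual values are those reported in Table 3.

```python
from collections import defaultdict
from statistics import mean


def classroom_gaps(rows):
    """rows: iterable of (classroom, teacher_score, benchmark_score).

    Returns {classroom: (teacher_mean, benchmark_mean, gap)}, where
    gap = teacher_mean - benchmark_mean (positive means the teacher
    scored higher than the benchmark raters on average).
    """
    by_room = defaultdict(list)
    for room, teacher, benchmark in rows:
        by_room[room].append((teacher, benchmark))

    result = {}
    for room, pairs in sorted(by_room.items()):
        teacher_mean = mean(t for t, _ in pairs)
        benchmark_mean = mean(b for _, b in pairs)
        result[room] = (teacher_mean, benchmark_mean, teacher_mean - benchmark_mean)
    return result


# Hypothetical rows, for illustration only.
rows = [("A", 4, 3), ("A", 3, 3), ("A", 5, 3), ("B", 2, 2), ("B", 3, 3)]
gaps = classroom_gaps(rows)
```

A classroom whose gap is zero would sit on the diagonal reference line of the scatterplot; a positive gap places it above the line.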

Figure 1: http://cnx.org/content/m41218/latest/figurel.png/image
Figure 2: http://cnx.org/content/m41218/latest/figure2.png/image

While the teachers’ scores tend to be higher, they are in fact correlated with the benchmark ratings: a
moderate correlation (Pearson r = .53). The classrooms with higher teacher ratings tend to be
the classrooms with higher benchmark ratings, which indicates that classroom teachers are more or less consistent
in applying the assessment rubric but generally err by scoring too high. Classrooms 1, 5, and especially 7
(Table 3) show the biggest departures from the benchmark ratings.
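
For reference, the Pearson correlation between two lists of classroom means can be computed as sketched here. The function `pearson_r` is written out by hand so the formula is visible, and the classroom means shown are hypothetical, not the values behind the reported r = .53.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical classroom means, for illustration only.
benchmark_means = [2.2, 2.4, 2.5, 2.7, 2.9]
teacher_means = [3.1, 2.6, 2.9, 3.3, 3.0]
r = pearson_r(benchmark_means, teacher_means)
```

Note that a correlation only describes whether classrooms are ranked similarly by the two sets of raters; it does not reveal whether one set of scores is systematically higher, which is why the means are examined as well.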

A deviation analysis is another way to summarize how well teacher-graded writing scores for individual 
students match the benchmark scores. Each student paper has a teacher-rated score and a benchmark score. 
Thus, for each paper, one can compute a “difference score.” This difference score is computed as an absolute 
value (i.e., the difference score is 1 whether the teacher score is 4 and the benchmark score is 3, or vice 
versa). Table 4 shows the frequency of occurrence of the difference scores: out of 168 papers, the teacher 
and benchmark scores matched 86 times (51% of papers); differed by one point 61 times; differed by two 
points 20 times; and by 3 points once. Thus, for 147 of the 168 papers (87.5%), the teacher rater and the
benchmark rater were within one point of each other in their scoring.
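
The arithmetic behind these percentages can be checked with a short sketch. The function names are hypothetical; the counts in `reported` are the frequencies given in the text above.

```python
from collections import Counter


def difference_counts(pairs):
    """Count absolute teacher-minus-benchmark differences for (teacher, benchmark) pairs."""
    return Counter(abs(teacher - benchmark) for teacher, benchmark in pairs)


def percent_within(counts, max_gap):
    """Percent of papers whose absolute difference is at most max_gap."""
    total = sum(counts.values())
    within = sum(n for gap, n in counts.items() if gap <= max_gap)
    return 100 * within / total


# Frequencies of difference scores reported in the text for the 168 papers.
reported = Counter({0: 86, 1: 61, 2: 20, 3: 1})

exact_match = percent_within(reported, 0)   # 86 of 168 papers
within_one = percent_within(reported, 1)    # 86 + 61 = 147 of 168 papers
print(f"exact: {exact_match:.1f}%  within one point: {within_one:.1f}%")
```

With raw (teacher, benchmark) score pairs, `difference_counts` produces the same kind of frequency table directly from the data.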

Table 4 

Individual, student-level deviation scores can be aggregated to the classroom level and examined for
individual classrooms. As described above, each paper has a teacher-rated score and a benchmark score,
and so each paper also has a “difference score.” Table 5 shows the average, at the classroom level, of the
difference scores. A score of zero for a classroom would show that the teacher’s scores exactly matched the
benchmark score for each paper. (No classroom achieved this.) Classroom #3’s scores are very close to the
benchmark scores, suggesting this teacher’s assessment standards and criteria are highly aligned with those
of the benchmark raters. Classrooms #7 and #5 are the farthest from the benchmarks.

Classroom #4 is an interesting case in that this teacher’s mean rating of his/her students’ papers is very 
close to the benchmark raters’ mean (Table 3). However, the difference score is relatively large (.73, as shown 
in Table 5). Thus, even though the means are similar, this teacher’s ratings differ often in both directions 
from the benchmark ratings, showing this teacher is not very consistent in applying the rubric. This shows 
why it is important not just to compare means, but also to compare the difference scores. 
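
A small made-up example shows how equal means can coexist with frequent disagreement. The six score pairs below are hypothetical and chosen only to make the point; they are not classroom #4's data, and the function names are illustrative.

```python
from statistics import mean


def mean_gap(pairs):
    """Teacher mean minus benchmark mean for (teacher, benchmark) pairs."""
    return mean(t for t, _ in pairs) - mean(b for _, b in pairs)


def mean_abs_difference(pairs):
    """Average absolute difference between teacher and benchmark scores."""
    return mean(abs(t - b) for t, b in pairs)


# Hypothetical (teacher, benchmark) scores for six papers.
pairs = [(4, 2), (2, 4), (3, 3), (5, 3), (1, 3), (3, 3)]

gap = mean_gap(pairs)                   # both means equal 3, so the gap is 0
avg_diff = mean_abs_difference(pairs)   # differences of 2, 2, 0, 2, 2, 0 average to 8/6
```

Because overestimates and underestimates cancel in the mean but not in the absolute difference, the second measure exposes inconsistency that the first one hides.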

Table 3: http://cnx.org/content/m41218/latest/table3.png/image
Table 4: http://cnx.org/content/m41218/latest/table4.png/image

Table 6 shows the scores (teacher-graded and benchmark) from classroom #7, the classroom with the
largest difference scores. The scores are organized by the size of the gap between the benchmark rater’s score
for each paper and the teacher’s score for each paper. The shading shows visually the extent of the differences
between benchmark-rated and teacher-rated scores among the 24 classroom papers. The darker the shading, the
greater the disparity between the teacher’s score and the benchmark score. The teacher in classroom #7 is not
consistent in applying the rubric. This suggests s/he does not have a clear understanding of the rubric-based
criteria and standards for assessing student papers.

Table 6 

7 Section III. Discussion Questions and Assignments

7.1 Questions and Exercises Related to Section I. 

(Exercise #1) There are surveys and rubrics available on the web for assessing school culture and collaborative 
practices among teachers. Below are web links to a few. Your own school or district may use a survey. Find 
a survey and select about eight items to capture the main dimensions of a well-functioning collaborative 
group and the attributes of school culture that support it. Apply those selected items to your own work 
situation (in a school) or, if you don’t work in a school, try to arrange a meeting with a working teacher
and review and discuss the selected items (how that teacher views the culture of his/her school with respect
to the particular dimensions reflected in the survey items). Record your results and explain whether the 
ratings you observe are satisfactory or whether practice should attempt to exhibit greater collaboration. 
Compare your results to those of others doing this same exercise. If there are notable differences between 
results, discuss what might be the reason for the different results. 

• http://schoolreforminitiative.org/protocol/doc/plc_survey.pdf

• http://files.solution-tree.com/pdfs/Reproducibles_BPLC/midyearplcsurvey.pdf

Table 5: http://cnx.org/content/m41218/latest/tables.png/image
Table 6: http://cnx.org/content/m41218/latest/table6.png/image

(Exercise #2) Imagine you are the principal of a school with staff and working conditions similar to Gilbert
Elementary School - that is, a staff that is quite varied in seniority, working habits, talents, and levels of
enthusiasm for greater collaboration. You want more collaboration on curriculum planning, instructional
strategies, and peer mentoring. Develop either a 400-word memo or a 300-word speech that would be the
first communication to the staff about your perception of the need for change. Whatever else your message
includes, provide at least three reasons to justify your position. Your message should anticipate responses of
skeptical staff members who will wonder, “Why should we do this? What’s wrong with the way things are
now?”

7.2 Questions and Exercises Related to Section II. 

(Discussion question #1) The average scores of the teacher-graded papers are 0.4 higher than the averages 
of the benchmark papers (Table 4), with much of this gap coming from three classrooms. Do these results 
indicate standards for assessing and grading writing need to be raised? If so, what is the basis for your 
conclusion? 

(Discussion question #2) Suppose someone claimed that the writing assessment results actually understate 
the true range of writing proficiency in typical classrooms in the building. The person claims that 
in reality the true range is even greater than the ratings suggest, and that if there were a different kind of 
writing task (prompt) and a more elaborate rubric, the results would show a bigger disparity between the 
high level writers and the low level writers. How would you investigate this possibility? Do you think that 
the distribution of scores in a classroom might be different with a different kind of writing prompt or scoring 
rubric? 

(Discussion question #3) The average rating from teacher Jones of the student papers in his classroom 
is the same as the average rating of those same papers from the benchmark rater. Does this mean teacher 
Jones and the benchmark rater are in agreement in their application of the rubric’s criteria and scales? Give 
an example to illustrate your point. 

(Exercise #1) Table 6 shows the classroom level deviation scores. Assume you are the principal and you 
have the data for these tables. Decide how you would communicate this information to the teachers and 
what you would do on the basis of the information. 

(Exercise #2) Develop a specific plan for a 4-hour workshop that would follow after the benchmarking 
activity described above. This plan should be targeted at an identified grade level. From the literature, 
identify three or four excellent readings you would assign to workshop participants to prepare. 

(Exercise #3) Propose a new and different analysis using additional variables and exploring different 
questions. The data set used for the benchmarking activity is described in Step 4 of Section II. In addition 
to the data and variables described in Step 4 above, assume these variables are also part of the data set: 
marking period grades, gender, race, special education classification, and free-lunch eligibility. Also, you 
may propose adding additional variables if you have a specific inquiry in mind that could be done with data 
collection. 

The proposal should begin with a “problem statement” - that is, state the issue, need, or concern that 
motivates doing the analysis and how the information sought can help address the concern. As with the 
above case’s analyses, do not imply that your proposed analyses will provide conclusive evidence; rather, the 
objective is better information, a better understanding of key outcomes in the program, and more informed 
planning and decisions. The proposal should explain the analyses to be conducted, much like the explanations 
in Section II above, and it should offer preliminary thoughts on what you would conclude given different 
findings. For instance, “A strong correlation between [name of variable] and [name of variable] would be a 
cause for concern because....” “If a strong correlation is observed, this would invite further inquiry into....” 

8 Section IV. Notes for Instructor Related to Discussion Questions and Assignments 

(Comments on Exercise #2, Section I.) Theoretical justification for PLCs: (a) strengthen teachers’ sense of 
ownership over curriculum and professional development; (b) improve quality of decision-making by pooling 
expertise; (c) raise level of accountability to colleagues. For more information, see DuFour, DuFour, and 
Eaker (2008), Kilbane (2009), McLaughlin and Talbert (2006), and Mullen and Hutinger (2008). 

(Comments on Discussion question #1, Section II.) That academic expectations in writing in some 
classrooms (notably, classrooms 1, 5, and 7) need to be higher is a reasonable conclusion. While the evidence 
from the benchmarking exercise alone does not constitute incontrovertible proof, the evidence is definitely 
strong enough to warrant concern about a gap between what the three teachers view as “at standard” writing 
and the level of proficiency prescribed in state standards as specified in rubrics. It should be emphasized 
that these data - the evidence generated from an exercise like this - must be interpreted cautiously and 
discourse among participants should use appropriately qualified language. The evidence should stimulate 
further discussion about writing instruction, assignments, expectations, and grading. 

(Comments on Discussion question #2, Section II.) Some experts believe that standardized writing 
assessments (like the one described here) may have the effect of constraining the range of performance. 
Imagine, for instance, if the students of an average 5th grade class were given 24 hours to write an 800 
word evidence-based argument on a particular topic (some issue of relevance to 5th graders). The low end 
papers would not look a lot different from the low end papers in a standardized writing assessment, but 
the high end papers would be sophisticated essays - fully developed arguments with evidence, definitions, 
explication of assumptions, examples, and possibly even rebuttals to counter positions. Thus, the range 
of papers from worst to best would grow because the most able and motivated students would not be 
constrained by a relatively short time limit and by the five paragraph structure imposed by standardized 
writing assessment rubrics. Being less constrained, the top students would have more freedom for reading, 
writing, revising and creating their own argument. So in this sense, relative to the standardized “one period” 
writing assessment, the observed range in quality of papers submitted would grow. However, what this 
means is less clear: the 24 hour essay allows more room for other attributes to factor into performance, 
namely motivation, substantive background knowledge, and information processing skill that arguably fall 
outside the domain of writing. This raises the question, then: would an assessment based on the 24 hour 
essay be just a measure of writing, or would it assess some broader construct? There is no simple answer to 
this question because it depends on how we think of and define writing. While there is no simple answer, it 
is instructive to contemplate this question and examine our own conceptions of the construct, “writing.” 

(Comments on Discussion question #3, Section II.) The answer to the question is “No”: the two averages 
being the same does not mean the teacher and the benchmark rater are consistent with each other in how they 
apply the rubric. It is essential to compare not just mean scores, but also to examine and compare the 
deviation scores for individual papers. For example, the teacher could grade three papers 5, 3, and 1, while 
the benchmark scores for those same three papers are 1, 3, and 5. Both sets of ratings have an average of 
3.0, but, clearly, the teacher is not interpreting the rubric in the same way as the benchmark rater. 
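
The arithmetic in this example can be checked with a short Python sketch. It assumes that a deviation score is the teacher's rating minus the benchmark rating for the same paper; the variable names are illustrative and are not taken from the case.

```python
from statistics import mean

# Ratings from the example above: the same three papers scored by
# the teacher and by the benchmark rater.
teacher = [5, 3, 1]
benchmark = [1, 3, 5]

# Deviation score per paper (assumed definition: teacher minus benchmark).
deviations = [t - b for t, b in zip(teacher, benchmark)]

mean_teacher = mean(teacher)
mean_benchmark = mean(benchmark)

# The signed mean lets positive and negative deviations cancel out.
mean_deviation = mean(deviations)

# The mean absolute deviation ignores direction, so disagreement shows up.
mean_abs_deviation = mean(abs(d) for d in deviations)

print("Teacher mean:", mean_teacher)
print("Benchmark mean:", mean_benchmark)
print("Deviations per paper:", deviations)
print("Mean deviation:", mean_deviation)
print("Mean absolute deviation:", round(mean_abs_deviation, 2))
```

The two means match and the signed mean deviation is zero, yet two of the three papers differ by four rubric points. This is why paper-level deviations, or their absolute values, are more informative than classroom averages alone.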

(Comments on Exercise #1, Section II.) The classroom level (not student level) deviation scores could be 
shown to teachers either as a group in a meeting or individually in one-on-one conferences; and, if in a group, 
can be viewed either with teachers identified or not. The leader would want to consider teachers’ level of 
concern about being identified and about whether there would be excessive discomfort in openly discussing 
individual assessment results. In some schools this would not be a problem, but if a school’s leadership and 
teachers are not practiced with such discussions, it may be better to design the process to avoid inter-teacher 
comparisons. For instance, code numbers could be substituted for teachers’ names in viewing tables showing 
the full set of classroom results, with later individual conversations to discuss individual results. Ideally, 
professionals should be comfortable with discussing practice and be accountable to supervisors, colleagues, 
and clients for performance outcomes; the reality is that culture and practice in many schools do not reflect 
this ideal. 
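
One way to prepare a table with code numbers in place of names is sketched below in Python. The teacher names, the deviation values, and the `assign_codes` helper are all hypothetical and serve only to illustrate the idea; they are not data from the case.

```python
import random

# Hypothetical classroom-level mean deviation scores (teacher minus benchmark).
# Names and values are invented for illustration only.
classroom_deviation = {"Teacher A": 0.9, "Teacher B": -0.1, "Teacher C": 0.4}


def assign_codes(names, seed=None):
    """Map each name to a code such as 'T01', in shuffled order so the
    code does not reveal alphabetical or roster position."""
    shuffled = list(names)
    random.Random(seed).shuffle(shuffled)
    return {name: f"T{i:02d}" for i, name in enumerate(shuffled, start=1)}


# The name-to-code key would be kept by the principal only.
codes = assign_codes(classroom_deviation)

# The table shown to the group carries codes, not names.
anonymized = {codes[name]: score for name, score in classroom_deviation.items()}

for code in sorted(anonymized):
    print(f"{code}  {anonymized[code]:+.1f}")
```

In a small building, teachers may still recognize their own classroom from its score, so code numbers reduce but do not remove the chance of identification.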

(Comments on Exercise #2, Section II.) Suggested workshop activity: Collaborative grading of papers 
can be a very productive activity if well planned, organized, and led by an experienced workshop leader 
with expertise in writing instruction and assessment. The activity involves reading and discussing a range of 
student papers, discussing each paper’s strengths and weaknesses, comparing the attributes of weak papers to 
better ones, and individual participants explaining to others their grading criteria with examples from papers 
illustrating these criteria. The workshop can include mini-benchmarking sessions with a small number of 
papers (i.e., individual reading and rating of papers followed by comparing and discussing results - discussions 
should be connected with local or state documents containing approved writing standards). These sessions 
would be well served by creating tables like Table 7 so participants can examine deviation scores for ratings 
on individual papers and discuss disparities and outlier scores among the participants. The workshop should 
culminate with a focus on instructional practice to improve the writing. 

(Comments on Exercise #3, Section II.) Many additional analyses are possible. Here are a few examples: 
(a) Probe more deeply into “error” patterns (deviation scores) in teachers’ application of the assessment 
rubric. (Table 7 shows a deviation analysis for one teacher.) Are teachers more likely to be off randomly or in 
predictable ways? (b) How big are the differences in performance scores (writing, test scores, grades) among 
the different demographic groups and do these gaps differ depending upon the measure? (c) Do students 
with higher standardized test scores also do better on the writing assessment? Get better grades? How 
strong is the correlation? Do relationships between achievement variables vary by demographic category? 
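
Analyses (b) and (c) can be sketched in a few lines of Python. The records, field names, and values below are invented for illustration; an actual analysis would use the school's own data set and would need far more cases than shown here before any pattern could be taken seriously.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


def mean_by_group(records, group_key, value_key):
    """Average of value_key within each category of group_key."""
    totals = {}
    for r in records:
        s, c = totals.get(r[group_key], (0.0, 0))
        totals[r[group_key]] = (s + r[value_key], c + 1)
    return {g: s / c for g, (s, c) in totals.items()}


# Hypothetical student records (not data from the case).
records = [
    {"test_score": 410, "writing": 2, "free_lunch": True},
    {"test_score": 455, "writing": 3, "free_lunch": True},
    {"test_score": 470, "writing": 3, "free_lunch": False},
    {"test_score": 520, "writing": 4, "free_lunch": False},
    {"test_score": 540, "writing": 5, "free_lunch": False},
]

# (c) Do students with higher test scores also score higher in writing?
r = pearson_r([s["test_score"] for s in records], [s["writing"] for s in records])

# (b) How large is the gap in writing scores between groups?
gaps = mean_by_group(records, "free_lunch", "writing")

print("Correlation, test score vs. writing:", round(r, 2))
print("Mean writing score by free-lunch eligibility:", gaps)
```

As the exercise itself cautions, a correlation or group gap found this way is a prompt for further inquiry, not conclusive evidence.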

Table. Rubric for Exercise #3, Section II: Proposal for Additional Analyses 

Rating scale: 1 = Unacceptable, 2 = Adequate, 3 = Target 

Problem Statement 
1 = Unacceptable: Problem identified is not specific; hard to understand; significance not clear. 
2 = Adequate: Problem identified is relatively specific, though may have some unclear aspects; significance is mentioned but not entirely compelling. 
3 = Target: Problem identified is specific; significance is compellingly established. 

Purpose 
1 = Unacceptable: The purpose of the analysis is not explained. It is not clear what the purpose is; why the analysis is being done. 
2 = Adequate: There is a purpose statement and it is generally clear though not as specific as it could be. It gives some attention to why the analysis is needed. 
3 = Target: The purpose statement is clear and detailed in explaining why the analysis is needed and how the analysis will help planning and decision making. 

Data Description 
1 = Unacceptable: Little or no information provided on data: cases, variables, measures are not clear. 
2 = Adequate: Information is provided to explain the variables being examined. 
3 = Target: Complete information describes cases, variables, and measures. 

Analytical Strategy 
1 = Unacceptable: No analytical strategy is evident or the strategy presented is inappropriate or difficult to understand. There is little or no clear inferential logic to the design of the analyses. The analysis will not yield useful information. 
2 = Adequate: The analytical strategy is generally appropriate. Elements of the logic may be unclear in places with some additional justification needed, but overall the analyses are justified and will produce useful evidence. 
3 = Target: There is a clear, logical, and well-justified strategy. The strategy is appropriate for the questions asked and will provide highly relevant, credible evidence to clarify issues and inform plans and decisions. 

Comments on Possible Findings or Decisions 
1 = Unacceptable: Little or no discussion of possible findings or implications. 
2 = Adequate: Possible findings and implications are considered, though may be somewhat sketchy; few or no qualifications/limitations offered. 
3 = Target: Possible findings and implications are considered in some detail and well thought out. Appropriate qualifications and/or limitations are offered. 

Overall Coherence and Presentation 
1 = Unacceptable: The proposal is not organized well. Paragraph coherence is frequently weak; many sentences are unclear; too many sections are difficult to follow. 
2 = Adequate: The proposal is generally well organized and clearly presented, though some sections have weaknesses in clarity, concision, and/or coherence. 
3 = Target: The proposal is consistently well organized and clearly presented. The prose is clear, concise, and well organized. 